Two data sets (and a county shapefile) were used to make the following visualizations:
The Statlog dataset was originally accessed through the UCI repository, but that link currently appears to be broken or missing. The first (Kaggle) link above for the Statlog heart disease dataset does currently work.
The plot below is static and has been reduced in size. For a larger, interactive version, scroll down.
Note: The interactivity for the above plot works offline but does not render correctly in GitHub.
The above plot shows the estimated total pesticide use by Florida counties in 2015. The top 20 counties are in red and the bottom in blue. Unsurprisingly, highly agricultural counties use more pesticides than others. This was confirmed by referencing the 2017 agricultural market values from the Florida Department of Agriculture and Consumer Services.
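The ranking and coloring behind a plot like this can be sketched in a few lines. This is a minimal illustration, not the code used for the figure: the county names and totals are invented placeholders, `rank_and_color` is a hypothetical helper, and the size of the "bottom" group is left as a parameter because the text does not state it.

```python
# Made-up totals for illustration only; these are NOT the real 2015 estimates.
county_use = {
    "County A": 1200.0,
    "County B": 30.0,
    "County C": 5000.0,
    "County D": 450.0,
    "County E": 0.0,
    "County F": 2600.0,
}


def rank_and_color(totals, n_top, n_bottom):
    """Sort counties from highest to lowest use and assign a bar color.

    The first n_top counties are marked red, the last n_bottom blue,
    and everything in between grey. If the two groups overlap, red wins.
    """
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for position, (county, total) in enumerate(ranked):
        if position < n_top:
            color = "red"
        elif position >= len(ranked) - n_bottom:
            color = "blue"
        else:
            color = "grey"
        rows.append((county, total, color))
    return rows


# The write-up uses a top group of 20; 2 is used here to fit the toy data.
rows = rank_and_color(county_use, n_top=2, n_bottom=2)
for county, total, color in rows:
    print(f"{county:<10} {total:>8.1f}  {color}")
```

The resulting `(county, total, color)` rows can be passed to any bar-chart function that accepts per-bar colors.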
The above map shows the same information as the first visualization, but it makes geographic patterns visible that are difficult to recognize in a list of county names.
There is little to no pesticide use in the
There is also very little use in the panhandle near Apalachicola National Forest.
Most of the pesticide use is in central to south-central Florida. The map quite likely reflects the density of farmland in each county. Some interesting extensions would be to:
For the third visualization, the Statlog Heart Disease data set was used (the same data set used in mini-project 1). A linear model was fit to predict heart disease (positive versus negative) using all thirteen available predictors. In the above coefficients plot, we see that
are the most significant predictors.
Interestingly,
appear to have very little predictive power.
The linear model is very simple and is not recommended for prediction purposes. However, the coefficients plot can inform further analysis and feature selection.
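As a rough sketch of how coefficients for such a plot might be obtained, the example below fits an ordinary least squares model to a 0/1 outcome with NumPy. Several things here are assumptions rather than details from this project: the data are synthetic (three invented predictors instead of the thirteen Statlog ones), the predictors are standardized so that coefficient sizes are comparable, and `standardized_coefficients` is a hypothetical helper name.

```python
import numpy as np

# Synthetic stand-in data (assumption): one strong, one weak, one pure-noise predictor.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
signal = 2.0 * X[:, 0] + 0.2 * X[:, 1]
# Binary outcome (1 = positive, 0 = negative), loosely mimicking a disease label.
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(float)


def standardized_coefficients(X, y):
    """Fit y ~ intercept + standardized X by least squares; return the slopes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # put predictors on a common scale
    A = np.column_stack([np.ones(len(Z)), Z])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]                            # drop the intercept


names = ["strong", "weak", "noise"]
coefs = standardized_coefficients(X, y)

# List predictors from largest to smallest absolute coefficient,
# which is the ordering a coefficients plot makes visible.
for i in np.argsort(-np.abs(coefs)):
    print(f"{names[i]:>6}: {coefs[i]:+.3f}")
```

Sorting by absolute coefficient only compares effect sizes on the standardized scale; it is not a significance test, so it works best as a first pass before more careful feature selection.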